[AHM] Prepare election-provider-multi-block for full lazy data deletion#8304
[AHM] Prepare election-provider-multi-block for full lazy data deletion#8304kianenigma merged 4 commits intomasterfrom
election-provider-multi-block for full lazy data deletion#8304Conversation
election-provider-multi-block for full lazy data deletionelection-provider-multi-block for full lazy data deletion
|
I will let @niklasad1 to proper comment on the actual changes 😅 I am wondering what the impact for the clearing submission strategy is and if we still need to reclaim deposit explicitly with this new approach or if it is something that the chain can do for the miner instead. Is the one below a fair summary for miner's Point of View? Current behavior:The staking miner needs to be careful about previous submissions potentially interfering with new ones during round transitions. When a new election round starts, the miner might need to explicitly clear or withdraw previous solutions (see paritytech/polkadot-staking-miner#1029). New behavior with round-indexed storage:
Or do I get it wrong, @kianenigma ? |
|
This shouldn't be an issue for the staking-miner rather than that we have to update the metadata and it only affects that multi-block verifier miner doesn't have to clean up old data? Maybe worse db migrations etc You got my 👍 |
The signed phase hasn't really changed in this PR -- the deletion of data was lazy, and it will remain as it was before.
So yea, this is not a correct statement. The old signed submissions are already key-ed by round, and they can be deleted later. Also note that you don't have to do it asap, you will just get your deposit back when you clear it. |
Great, thank you, makes sense! |
@kianenigma , got it. Not for now but in a distant future, what does prevent us to move the responsibility to clear / reclaim the deposit from the miner to the chain / pallet ? In theory when you do the lazy deletion and everything, you will have all the information about discarded solutions or not? |
Yeah we could, but the problem is that when the pallet logic force-deletes everything all at once, you end up with a lot of storage operations in a single block. The blockchain programming model is kinda uncomfortable here, but we have some tools for it:
|
`desired_targets` storage item now requires the round index as a parameter, so this change correctly aligns our code with the updated storage API in the Polkadot SDK. See [SDK PR](paritytech/polkadot-sdk#8304).
`desired_targets` storage item now requires the round index as a parameter, so this change correctly aligns our code with the updated storage API in the Polkadot SDK. See [SDK PR](paritytech/polkadot-sdk#8304).
* Update multiblock metadata (tracking SDK rev 8da42e8fae) `desired_targets` storage item now requires the round index as a parameter, so this change correctly aligns our code with the updated storage API in the Polkadot SDK. See [SDK PR](paritytech/polkadot-sdk#8304). * Handle round parameter in snapshot functions [SDK #8304](https://github.com/paritytech/polkadot-sdk/pull/8304/files) has introduced a breaking change for which also snapshot methods are keyed on round index. Update the code accordingly. * Fix fmt
…tion (paritytech#8304) Follow-up to: paritytech#8127 This PR makes all the hefty storage items in `multi-block` and `multi-block::verifier` to be keyed by a round index (note that the same has already been done for `multi-block::signed`). This allows us to stop deleting all the data upon a round ending, and instead simply know that it is stale, and just delete it later via `#[pallet::task]`, or a free extrinsic. As per paritytech#8127, the worst PoV weight of this pallet with the proposed configurations is around 4Mb uncompressed. With 10Mb PoV, this will likely be okay for now. Therefore, this PR is adding the infrastructure for lazy deletion, yet it is not activating it. I am raising this in advance because 1. It will prevent a data migration in the past. Migrating a Map to `DoubleMap` and so on in transit is a pain 2. The `polkadot-staking-miner` better adapt itself already to read these storage items in the new format A follow-up may replace `OnRoundRotation` with `()`, and add the transactions/`#[pallet::task]` needed to do the deletion lazily. cc @niklasad1 @sigurpol for review and a 👍 that this will be incorporated in the miner. --- ## Reflection In the past months, I have been working on two pallets: 1. `pallet-staking-async`, a lot of code for which I have written in 2019/2020, with little experience in FRAME. 2. `pallet-election-provider-multi-block`, which I started writing in 2021, and resumed in 2025 (yes, 4 years later :D). But crucially, I can acknowledge that by this time I had a much better understanding of how to write FRAME pallets in a safe and extensible way. And nowadays, I can feel the difference very clearly. Adding anything in `pallet-staking-async` seems unsafe, a million test cases break, but they are mostly false negatives, leading to uncertainty and more likelihood of missing the few that are actually false-positives. The code has patchy structure, and has a lot of assumptions that are, perhaps written down _somewhere_ but are not particularly enforced. Contrary, adding a new feature to `pallet-election-provider-multi-block`, like this PR does, feels like a smooth ride. I know the code has many checks in place, it will prevent me from making mistakes, the APIs are much clearer, and in general I have very little doubt of something breaking. And I think this boils down to **two practices** that I deployed in the latter, but didn't have the expertise to do in the former: 1. **Composite storage items**: Almost any complex FRAME pallet has a number of storage items that are related, and it helps A TON if you put them behind a type the does the reads and writes together, and asserts all invariants on the spot via a `fn mutate_checked` and/or asserting it in a `fn try_state` that is executed in all unit tests. 2. I spent almost as much time and energy in developing the `mock.rs` as I did in developing the core pallet logic. Having a great test setup will definitely help. I often see developers giving very little attention to their `mock.rs`, it being a copy and paste of the neighboring pallet, and I find this to be counter-productive: It saves you time today, but it will have indefinite cumulative burden on you and others in the future. I wrote more about all of these in a very old forum post: https://forum.polkadot.network/t/testing-complex-frame-pallets-discussion-tools/356#composite-semi-private-storage-types-3
* master: (120 commits) [CI] Improve GH build status checking (#8331) [CI/CD] Use original PR name in prdoc check for the backport PR's to the stable branches (#8329) Add new host APIs set_storage_or_clear and get_storage_or_zero (#7857) push to dockerhub (#8322) Snowbridge - V1 - Adds 2 hop transfer to Rococo (#7956) [AHM] Prepare `election-provider-multi-block` for full lazy data deletion (#8304) Check umbrella version (#8250) [AHM] Fully bound staking async (#8303) migrate parachain-templates tests to `gha` (#8226) staking-async: add missing new_session_genesis (#8310) New NFT traits: granular and abstract interface (#5620) Extract create_pool_with_native_on macro to common crate (#8289) XCMP: use batching when enqueuing inbound messages (#8021) Snowbridge - Tests refactor (#8014) Allow configuration of worst case buy execution weight (#7944) Fix faulty pre-upgrade migration check in pallet-session (#8294) [pallet-revive] add get_storage_var_key for variable-sized keys (#8274) add poke_deposit extrinsic to pallet-recovery (#7882) `txpool`: use tracing for structured logging (#8001) [revive] eth-rpc refactoring (#8148) ...
…tion (#8304) Follow-up to: #8127 This PR makes all the hefty storage items in `multi-block` and `multi-block::verifier` to be keyed by a round index (note that the same has already been done for `multi-block::signed`). This allows us to stop deleting all the data upon a round ending, and instead simply know that it is stale, and just delete it later via `#[pallet::task]`, or a free extrinsic. As per #8127, the worst PoV weight of this pallet with the proposed configurations is around 4Mb uncompressed. With 10Mb PoV, this will likely be okay for now. Therefore, this PR is adding the infrastructure for lazy deletion, yet it is not activating it. I am raising this in advance because 1. It will prevent a data migration in the past. Migrating a Map to `DoubleMap` and so on in transit is a pain 2. The `polkadot-staking-miner` better adapt itself already to read these storage items in the new format A follow-up may replace `OnRoundRotation` with `()`, and add the transactions/`#[pallet::task]` needed to do the deletion lazily. cc @niklasad1 @sigurpol for review and a 👍 that this will be incorporated in the miner. --- ## Reflection In the past months, I have been working on two pallets: 1. `pallet-staking-async`, a lot of code for which I have written in 2019/2020, with little experience in FRAME. 2. `pallet-election-provider-multi-block`, which I started writing in 2021, and resumed in 2025 (yes, 4 years later :D). But crucially, I can acknowledge that by this time I had a much better understanding of how to write FRAME pallets in a safe and extensible way. And nowadays, I can feel the difference very clearly. Adding anything in `pallet-staking-async` seems unsafe, a million test cases break, but they are mostly false negatives, leading to uncertainty and more likelihood of missing the few that are actually false-positives. The code has patchy structure, and has a lot of assumptions that are, perhaps written down _somewhere_ but are not particularly enforced. Contrary, adding a new feature to `pallet-election-provider-multi-block`, like this PR does, feels like a smooth ride. I know the code has many checks in place, it will prevent me from making mistakes, the APIs are much clearer, and in general I have very little doubt of something breaking. And I think this boils down to **two practices** that I deployed in the latter, but didn't have the expertise to do in the former: 1. **Composite storage items**: Almost any complex FRAME pallet has a number of storage items that are related, and it helps A TON if you put them behind a type the does the reads and writes together, and asserts all invariants on the spot via a `fn mutate_checked` and/or asserting it in a `fn try_state` that is executed in all unit tests. 2. I spent almost as much time and energy in developing the `mock.rs` as I did in developing the core pallet logic. Having a great test setup will definitely help. I often see developers giving very little attention to their `mock.rs`, it being a copy and paste of the neighboring pallet, and I find this to be counter-productive: It saves you time today, but it will have indefinite cumulative burden on you and others in the future. I wrote more about all of these in a very old forum post: https://forum.polkadot.network/t/testing-complex-frame-pallets-discussion-tools/356#composite-semi-private-storage-types-3
* Update multiblock metadata (tracking SDK rev 8da42e8fae) `desired_targets` storage item now requires the round index as a parameter, so this change correctly aligns our code with the updated storage API in the Polkadot SDK. See [SDK PR](paritytech/polkadot-sdk#8304). * Handle round parameter in snapshot functions [SDK #8304](https://github.com/paritytech/polkadot-sdk/pull/8304/files) has introduced a breaking change for which also snapshot methods are keyed on round index. Update the code accordingly. * Fix fmt
* tests: add e2e tests for multi-block miner * tweak config files for gha * fix fmt * Fix snaphot fn after metadata update (#1037) * Update multiblock metadata (tracking SDK rev 8da42e8fae) `desired_targets` storage item now requires the round index as a parameter, so this change correctly aligns our code with the updated storage API in the Polkadot SDK. See [SDK PR](paritytech/polkadot-sdk#8304). * Handle round parameter in snapshot functions [SDK #8304](https://github.com/paritytech/polkadot-sdk/pull/8304/files) has introduced a breaking change for which also snapshot methods are keyed on round index. Update the code accordingly. * Fix fmt * chore(deps): bump actions/download-artifact from 4.2.1 to 4.3.0 (#1039) Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 4.2.1 to 4.3.0. - [Release notes](https://github.com/actions/download-artifact/releases) - [Commits](actions/download-artifact@v4.2.1...v4.3.0) --- updated-dependencies: - dependency-name: actions/download-artifact dependency-version: 4.3.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Do not build omni-node * Check RPC port for the parachain collator before running the miner Do not rely on specific text in polkadot logs but on the parachain collator to be up and running on the RPC port specified on the zombienet configuration. * Fix wrong page index while checking page submissions * Test successful <=> users is rewarded after score and all pages are submitted * Cleanup * Improve error handling * Add a TODO around proper zombienet cleanup --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Paolo La Camera <paolo@parity.io> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Follow-up to: #8127
This PR makes all the hefty storage items in
multi-blockandmulti-block::verifierto be keyed by a round index (note that the same has already been done formulti-block::signed).This allows us to stop deleting all the data upon a round ending, and instead simply know that it is stale, and just delete it later via
#[pallet::task], or a free extrinsic.As per #8127, the worst PoV weight of this pallet with the proposed configurations is around 4Mb uncompressed. With 10Mb PoV, this will likely be okay for now. Therefore, this PR is adding the infrastructure for lazy deletion, yet it is not activating it.
I am raising this in advance because
DoubleMapand so on in transit is a painpolkadot-staking-minerbetter adapt itself already to read these storage items in the new formatA follow-up may replace
OnRoundRotationwith(), and add the transactions/#[pallet::task]needed to do the deletion lazily.cc @niklasad1 @sigurpol for review and a 👍 that this will be incorporated in the miner.
Reflection
In the past months, I have been working on two pallets:
pallet-staking-async, a lot of code for which I have written in 2019/2020, with little experience in FRAME.pallet-election-provider-multi-block, which I started writing in 2021, and resumed in 2025 (yes, 4 years later :D). But crucially, I can acknowledge that by this time I had a much better understanding of how to write FRAME pallets in a safe and extensible way.And nowadays, I can feel the difference very clearly. Adding anything in
pallet-staking-asyncseems unsafe, a million test cases break, but they are mostly false negatives, leading to uncertainty and more likelihood of missing the few that are actually false-positives. The code has patchy structure, and has a lot of assumptions that are, perhaps written down somewhere but are not particularly enforced.Contrary, adding a new feature to
pallet-election-provider-multi-block, like this PR does, feels like a smooth ride. I know the code has many checks in place, it will prevent me from making mistakes, the APIs are much clearer, and in general I have very little doubt of something breaking.And I think this boils down to two practices that I deployed in the latter, but didn't have the expertise to do in the former:
fn mutate_checkedand/or asserting it in afn try_statethat is executed in all unit tests.mock.rsas I did in developing the core pallet logic. Having a great test setup will definitely help. I often see developers giving very little attention to theirmock.rs, it being a copy and paste of the neighboring pallet, and I find this to be counter-productive: It saves you time today, but it will have indefinite cumulative burden on you and others in the future.I wrote more about all of these in a very old forum post: https://forum.polkadot.network/t/testing-complex-frame-pallets-discussion-tools/356#composite-semi-private-storage-types-3